185 research outputs found

    Update of the G2D tool for prioritization of gene candidates to inherited diseases

    G2D (genes to diseases) is a web resource for prioritizing genes as candidates for inherited diseases. It uses three algorithms based on different prioritization strategies. The input to the server is the genomic region where the user is looking for the disease-causing mutation, plus an additional piece of information that depends on the algorithm used. This information can be the disease phenotype (given as an Online Mendelian Inheritance in Man (OMIM) identifier), one or several genes known or suspected to be associated with the disease (given as Entrez Gene identifiers), or a second genomic region that has also been linked to the disease. In the latter case, the tool uses known or predicted interactions between genes in the two regions, extracted from the STRING database. In every case, the output is a ranked list of candidate genes in the region of interest. For the first two methods, the candidate genes are first retrieved through a sequence homology search and then scored according to the corresponding method. As a result, some candidates correspond to well-characterized genes while others overlap with predicted genes, which widens the analysis. G2D is publicly available at http://www.ogic.ca/projects/g2d_2
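    To make the interaction-based strategy (the third method) more concrete, here is a minimal sketch that ranks genes in the region of interest by the summed confidence of their interactions with genes in the second linked region. The input format (tuples of two gene names and a score in [0, 1], loosely modeled on STRING combined scores), the summing rule, and the function name `rank_by_interactions` are all assumptions for illustration; this is not G2D's actual scoring.

```python
from collections import defaultdict


def rank_by_interactions(region_genes, second_region_genes, interactions):
    """Rank genes in the region of interest by interaction support from a second region.

    Hypothetical sketch: `interactions` is an iterable of (gene_a, gene_b, score)
    tuples describing undirected edges; scores are summed per candidate gene.
    """
    region = set(region_genes)
    partners = set(second_region_genes)
    scores = defaultdict(float)
    for a, b, score in interactions:
        # An undirected edge counts only if it links one region to the other.
        if a in region and b in partners:
            scores[a] += score
        if b in region and a in partners:
            scores[b] += score
    # Genes without cross-region interactions keep a score of 0.0 and sort last;
    # ties are broken alphabetically so the output is deterministic.
    return sorted(((g, scores[g]) for g in region), key=lambda item: (-item[1], item[0]))


ranked = rank_by_interactions(
    ["GENE_A", "GENE_B", "GENE_C"],  # placeholder gene names
    ["GENE_X", "GENE_Y"],
    [("GENE_A", "GENE_X", 0.9), ("GENE_B", "GENE_X", 0.5), ("GENE_Y", "GENE_B", 0.25)],
)
print(ranked)
```

Summing scores favors genes with several moderate-confidence links; taking the maximum edge score instead would favor a single strong interaction, and either choice is a judgment call under the stated assumptions.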

    Outer membrane pore protein prediction in mycobacteria using genomic comparison

    Proteins responsible for transport across the unique outer membrane of Mycobacterium spp. are attractive drug targets for treating human diseases caused by the mycobacterial pathogens M. tuberculosis, M. bovis, M. leprae and M. ulcerans. In contrast to E. coli, relatively few outer membrane proteins (OMPs) have been identified in Mycobacterium spp., largely because mycobacterial membrane proteins are difficult to isolate and our understanding of secretion mechanisms and cell wall structure in these organisms is incomplete. To further expand our knowledge of these elusive proteins, we have improved our previous method of OMP prediction in mycobacteria by taking advantage of genomic data from seven mycobacterial species. Our improved algorithm suggests 4333 sequences as putative OMPs in these seven species, with varying degrees of confidence. The most virulent pathogenic mycobacterial species are slightly enriched in these selected sequences. We present examples of predicted OMPs involved in horizontal transfer and paralogy expansion. Analysis of local secondary structure content allowed us to identify small domains predicted to function as OMPs; some examples show their involvement in tandem duplication and domain rearrangement events. We discuss the taxonomic distribution of these newly discovered families and architectures, which are often specific to mycobacteria or to the wider class Actinobacteria. Our results suggest that OMP functionality in mycobacteria is richer than expected, and they provide a resource to guide future research on these understudied proteins.

    A method for cell type marker discovery by high-throughput gene expression analysis of mixed cell populations

    BACKGROUND: Gene transcripts specifically expressed in a particular cell type (cell-type-specific gene markers) are useful for detecting that cell type and isolating it from a tissue or other cell mixture. However, finding informative marker genes can be problematic when working with a poorly characterized cell type, because markers can only be determined unequivocally once the cell type has been isolated. We propose a method that can identify marker genes of an uncharacterized cell type within a mixed cell population, provided that the proportion of the cell type of interest in the mixture can be estimated by some indirect method, such as a functional assay. RESULTS: We show that cell-type-specific gene markers can be identified from the global gene expression of several cell mixtures that contain the cell type of interest in known proportions: the markers are the genes whose expression correlates highly with the concentration of that cell type across the mixtures. CONCLUSIONS: Genes detected using this high-throughput strategy are candidate markers that may be useful for detecting or purifying a cell type in a particular biological context. We present an experimental proof of concept of this method using mixtures of several well-characterized hematopoietic cell types, and we evaluate the performance of the method in a benchmark that explores the requirements and range of validity of the approach.
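    The core idea (rank genes by how well their expression tracks the known proportion of the cell type across mixtures) can be sketched in a few lines. This assumes Pearson correlation as the correlation measure and an arbitrary cutoff `min_r=0.9`; both, and the function names, are illustrative assumptions rather than the published procedure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def rank_marker_candidates(expression, proportions, min_r=0.9):
    """Return (gene, r) pairs with r >= min_r, best first.

    expression: dict mapping gene -> expression values, one per mixture.
    proportions: estimated fraction of the target cell type in each mixture.
    """
    scored = [(gene, pearson(values, proportions)) for gene, values in expression.items()]
    hits = [(gene, r) for gene, r in scored if r >= min_r]
    return sorted(hits, key=lambda item: -item[1])


# Toy data: "marker" rises linearly with the cell-type fraction, the others do not.
props = [0.0, 0.1, 0.25, 0.5]
expr = {
    "marker": [1.0, 3.0, 6.0, 11.0],
    "flat": [5.0, 5.1, 4.9, 5.0],
    "anti": [10.0, 8.0, 5.0, 0.0],
}
print(rank_marker_candidates(expr, props))
```

Only positively correlated genes are kept here, since a marker of the target cell type should rise with its abundance; strongly negative correlations might instead point to markers of the other cell types in the mixture.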

    Functional and genomic analyses of α-solenoid proteins

    α-Solenoids are flexible protein structural domains formed by ensembles of α-helical repeats (Armadillo and HEAT repeats, among others). While homology can be used to detect many of these repeats, some α-solenoids have very little sequence homology to proteins of known structure, and we expect that many remain undetected. We previously developed a method for detecting α-helical repeats based on a neural network trained on a dataset of protein structures. Here we improved the detection algorithm and updated the training dataset using recently solved α-solenoid structures. Unexpectedly, we identified occurrences of α-solenoids in solved protein structures that had escaped attention, for example within the core of the catalytic subunit of PI3KC. Our results expand the current set of known α-solenoids. Applying our tool to the protein universe revealed that α-solenoids are significantly enriched in proteins with many interaction partners, confirming that α-solenoids are generally involved in protein-protein interactions. We then studied the taxonomic distribution of α-solenoids to discuss an evolutionary scenario for the emergence of this type of domain, speculating that α-solenoids emerged independently in multiple taxa by convergent evolution. We observe a higher rate of α-solenoids in eukaryotic genomes and in some prokaryotic groups, such as Cyanobacteria and Planctomycetes, which could be associated with increased cellular complexity. The method is available at http://cbdm.mdc-berlin.de/~ard2/
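    The abstract does not say which statistical test supports the "significant enrichment" of α-solenoids among highly connected proteins. One common choice for this kind of question is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test); the sketch below assumes that test, and the definition of a "hub" (a protein with many interaction partners) is left to the caller as an assumption.

```python
from math import comb


def hypergeom_upper_tail(total, hubs, solenoids, solenoid_hubs):
    """P(X >= solenoid_hubs) for X ~ Hypergeometric(total, hubs, solenoids).

    Assumed setup: `total` proteins, of which `hubs` are highly connected;
    `solenoids` proteins contain an alpha-solenoid, and `solenoid_hubs` of
    those are hubs. A small value suggests alpha-solenoids are over-represented
    among hubs relative to random sampling without replacement.
    """
    upper = min(hubs, solenoids)
    denominator = comb(total, solenoids)
    numerator = sum(
        comb(hubs, k) * comb(total - hubs, solenoids - k)
        for k in range(solenoid_hubs, upper + 1)
    )
    return numerator / denominator


# Toy numbers: 10 proteins, 5 hubs, 5 solenoid proteins, all 5 of them hubs.
print(hypergeom_upper_tail(10, 5, 5, 5))
```

Exact integer arithmetic via `math.comb` avoids overflow for moderate proteome sizes; for genome-scale counts a log-space implementation or a statistics library would be faster.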

    PubFocus: semantic MEDLINE/PubMed citations analytics through integration of controlled biomedical dictionaries and ranking algorithm

    BACKGROUND: Understanding research activity within any given biomedical field is important. Search outputs generated by MEDLINE/PubMed are not well classified and require lengthy manual citation analysis. Automating citation analytics can save considerable time for novices and experts alike. RESULTS: The PubFocus web server automates the analysis of MEDLINE/PubMed search queries by enriching them with two widely used, human factor-based bibliometric indicators of publication quality: journal impact factor and volume of forward references. In addition to providing basic volumetric statistics, PubFocus prioritizes citations and evaluates authors' impact on the searched field. PubFocus also analyzes the presence and occurrence of biomedical key terms within citations using controlled vocabularies. CONCLUSION: We have developed a citation prioritization algorithm based on journal impact factor, forward-referencing volume, referencing dynamics, and authors' contribution levels. It can be applied either to the primary set of PubMed search results or to subsets of these results identified through key terms from controlled biomedical vocabularies and ontologies. The NCI (National Cancer Institute) Thesaurus and MGD (Mouse Genome Database) mammalian gene orthology have been implemented for key-term analytics. PubFocus provides a scalable platform for integrating multiple available ontology databases, and its analytics can be adapted to sources of biomedical citations other than PubMed.
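    The abstract names the inputs to the prioritization algorithm but not how they are combined. As a purely hypothetical sketch, the code below combines two of them, journal impact factor and forward references per year since publication (a crude stand-in for "referencing dynamics"), as max-normalized features with equal weights. The `Citation` fields, the weights, and the normalization are assumptions, and authors' contribution level is omitted entirely.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    citation_id: str       # hypothetical identifier field
    impact_factor: float   # journal impact factor
    forward_refs: int      # number of later papers citing this one
    years_since_pub: int   # used to turn forward references into a rate


def prioritize(citations, w_if=0.5, w_refs=0.5):
    """Sort citations by a weighted, max-normalized score (hypothetical scheme)."""
    if not citations:
        return []

    def ref_rate(c):
        # Forward references per year; guard against a zero-year age.
        return c.forward_refs / max(c.years_since_pub, 1)

    # Normalize each feature by its maximum so both lie in [0, 1];
    # fall back to 1.0 when every value is zero.
    max_if = max(c.impact_factor for c in citations) or 1.0
    max_rate = max(ref_rate(c) for c in citations) or 1.0

    def score(c):
        return w_if * c.impact_factor / max_if + w_refs * ref_rate(c) / max_rate

    return sorted(citations, key=score, reverse=True)


ranked = prioritize([
    Citation("c3", 1.0, 1, 1),
    Citation("c2", 2.0, 100, 10),
    Citation("c1", 10.0, 20, 2),
])
print([c.citation_id for c in ranked])
```

Dividing forward references by age keeps recent papers from being penalized simply for having had less time to accumulate citations, which is one plausible reading of why a "dynamics" term would be included.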

    Hydration dynamics at fluorinated protein surfaces

    Water-protein interactions dictate many processes crucial to protein function, including folding, dynamics, interactions with other biomolecules, and enzymatic catalysis. Here we examine the effect of surface fluorination on water-protein interactions. Modifying designed coiled-coil proteins by incorporating 5,5,5-trifluoroleucine or (4S)-2-amino-4-methylhexanoic acid enables systematic examination of the effects of side-chain volume and fluorination on solvation dynamics. Using ultrafast fluorescence spectroscopy, we find that fluorinated side chains exert electrostatic drag on neighboring water molecules, slowing water motion at the protein surface.

    Selection of oligonucleotides for whole-genome microarrays with semi-automatic update

    Summary: Oligonucleotide microarray probes are designed to match specific transcripts present in databases that are regularly updated. As a consequence, probes should be rechecked against every new database release. We therefore developed an informatics tool that allows semi-automatic updating of long-oligonucleotide probe collections, and applied it to the mouse RefSeq database.
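    The kind of check such a tool has to repeat at each database release can be illustrated with a simplified sketch: re-map every probe against the new transcript set and flag probes that no longer match any gene or now match several genes. This assumes exact substring matching on the sense strand only; a real long-oligonucleotide pipeline would also tolerate mismatches and consider the reverse complement. The function name, status labels, and data layout are hypothetical.

```python
def check_probes(probes, transcripts):
    """Classify each probe against a new transcript release.

    probes: dict probe_id -> probe sequence.
    transcripts: dict transcript_id -> (gene_id, transcript sequence).
    Returns dict probe_id -> (status, sorted list of matched gene_ids), where
    status is "ok" (one gene), "ambiguous" (several genes) or "lost" (none).
    """
    report = {}
    for probe_id, probe_seq in probes.items():
        # Simplification: exact, same-strand substring match only.
        genes = {gene for gene, seq in transcripts.values() if probe_seq in seq}
        if not genes:
            status = "lost"
        elif len(genes) > 1:
            status = "ambiguous"
        else:
            status = "ok"
        report[probe_id] = (status, sorted(genes))
    return report


new_release = {  # toy sequences and identifiers
    "tx1": ("GeneA", "ACGTACGTTT"),
    "tx2": ("GeneB", "GGGACGTAAA"),
    "tx3": ("GeneA", "TTACGTACGT"),
}
print(check_probes({"p1": "ACGTACG", "p2": "ACGTA", "p3": "CCCC"}, new_release))
```

Probes are tracked per gene rather than per transcript, so a probe hitting two splice variants of the same gene still counts as specific; only hits across different genes are flagged for review.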

    Genes to Diseases (G2D) Computational Method to Identify Asthma Candidate Genes

    Asthma is a complex trait for which different strategies have been used to identify environmental and genetic predisposing factors. Here, we describe a novel methodological approach to selecting candidate genes for asthma genetic association studies. The Genes to Diseases (G2D) computational tool was used in combination with a genome-wide scan performed in a subsample of the Saguenay-Lac-St-Jean (SLSJ) asthmatic familial collection (n = 609) to identify candidate genes located in two suggestive loci that are linked with asthma (6q26) and atopy (10q26.3) and show differential parent-of-origin effects. This approach combined two gene-selection strategies: G2D data mining of public bibliographic and protein databases, and selection based on genes already known to be associated with the same or a similar phenotype. Ten genes (LPA, NOX3, SNX9, VIL2, VIP, ADAM8, DOCK1, FANK1, GPR123 and PTPRE) were selected for a subsequent association study in a large SLSJ sample (n = 1167) of individuals tested for asthma- and atopy-related phenotypes. Single nucleotide polymorphisms (n = 91) within the candidate genes were genotyped and analysed using a family-based association test. The results suggest a protective association of PTPRE rs7081735 with allergic asthma in the SLSJ sample (p = 0.000463; corrected p = 0.0478). This association was not replicated in the Childhood Asthma Management Program (CAMP) cohort. Sequencing of the regions around rs7081735 revealed additional polymorphisms, but genotyping them did not yield new associations. These results demonstrate that the G2D tool can be useful for selecting candidate genes located in chromosomal regions linked to a complex trait.

    BICEPP: an example-based statistical text mining method for predicting the binary characteristics of drugs

    BACKGROUND: The identification of drug characteristics is a clinically important task, but it requires considerable expert knowledge and consumes substantial resources. We have developed a statistical text-mining approach (BInary Characteristics Extractor and biomedical Properties Predictor: BICEPP) to help experts screen drugs that may have important clinical characteristics of interest. RESULTS: BICEPP first retrieves MEDLINE abstracts containing drug names, then selects the tokens that best predict membership in the list of drugs representing the characteristic of interest. Machine learning is then used to classify drugs using a document frequency-based measure. Evaluation experiments were performed to validate BICEPP's performance on 484 characteristics of 857 drugs, identified from the Australian Medicines Handbook (AMH) and the PharmacoKinetic Interaction Screening (PKIS) database. Stratified cross-validations revealed that BICEPP was able to classify drugs into all 20 major therapeutic classes (100%) and 157 (of 197) minor drug classes (80%) with areas under the receiver operating characteristic curve (AUC) > 0.80. Similarly, AUC > 0.80 could be obtained in the classification of 173 (of 238) adverse events (73%), up to 12 (of 15) groups of clinically significant cytochrome P450 enzyme (CYP) inducers or inhibitors (80%), and up to 11 (of 14) groups of narrow therapeutic index drugs (79%). Interestingly, the keywords used to describe a drug characteristic were not necessarily the most predictive ones for the classification task. CONCLUSIONS: BICEPP has sufficient classification power to automatically distinguish a wide range of clinical properties of drugs. It may be used in pharmacovigilance applications to assist with rapid screening of large drug databases, identifying important characteristics for further evaluation.
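    The AUC > 0.80 criterion used throughout these results has a direct probabilistic reading: the AUC is the probability that a randomly chosen drug with the characteristic receives a higher classifier score than a randomly chosen drug without it, with ties counted as one half. The sketch below computes it by pairwise comparison (the Mann-Whitney formulation); it illustrates the metric only and is not BICEPP's evaluation code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison.

    scores: classifier scores, higher meaning "more likely positive".
    labels: truthy for drugs that have the characteristic, falsy otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


# Two positives (0.9, 0.4) and two negatives (0.8, 0.3): 3 of 4 pairs ordered correctly.
print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

The double loop is O(n·m) and fine for a few hundred drugs; for larger sets, sorting scores once and summing positive ranks gives the same value in O(n log n).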

    The Barcode of Life Data Portal: Bridging the Biodiversity Informatics Divide for DNA Barcoding

    With the volume of molecular sequence data being systematically generated worldwide, there is a need for centralized resources for data exploration and analytics. DNA barcode initiatives are on track to generate a compendium of molecular sequence-based signatures for identifying animals and plants. To date, the tools available for exploring and analysing these data have existed only in boutique form, which is often a frustrating hurdle for researchers who may not have the resources to install or implement algorithms described by the analytic community. The Barcode of Life Data Portal (BDP) is a first step towards integrating the latest biodiversity informatics innovations with molecular sequence data from DNA barcoding. Through community-driven standards established in discussion with the Data Analysis Working Group (DAWG) of the Consortium for the Barcode of Life (CBOL), the BDP provides an infrastructure for incorporating existing and next-generation DNA barcode analytic applications in an open forum.